Snarls, pangenome deconstruction, and read mapping with vg giraffe

GET-A-PAN

Xian Chang & Jean Monlong

06/11/2025

Pangenome graph from assemblies

Built by aligning high-quality genome assemblies; each assembly is saved as a path through the pangenome.

Human Pangenome Reference Consortium (HPRC)

Liao, Asri, Ebler, et al. Nature 2023

Pangenome

With haplotype paths

Snarls, intuitively

Snarls, formal definition

A snarl is a subgraph bounded by two node sides that are:

  1. Separable: splitting each boundary node into its two node sides disconnects the subgraph from the rest of the graph

  2. Minimal: no node within the snarl has a node side that is separable with either boundary node side
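
The separability condition can be checked with a plain graph search. The sketch below is only an illustration, not how vg finds snarls: it assumes a toy bidirected graph given as a list of edges between node sides, where a node side is a hypothetical `(node_id, "L" or "R")` tuple. It tests condition 1 only and does not test minimality.

```python
from collections import deque


def other_side(side):
    """Return the opposite side of the same node: (7, "L") <-> (7, "R")."""
    node, end = side
    return (node, "R" if end == "L" else "L")


def is_separable(edges, x, y):
    """Check condition 1 for candidate boundary node sides x and y.

    edges: pairs of node sides, e.g. ((1, "R"), (2, "L")).
    x, y:  boundary node sides, each facing into the candidate snarl.
    """
    # Adjacency between node sides through graph edges.
    adjacent = {}
    for a, b in edges:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)

    boundary_nodes = {x[0], y[0]}
    seen = {x}
    queue = deque([x])
    while queue:
        side = queue.popleft()
        neighbours = set(adjacent.get(side, ()))
        # Crossing a node from one side to the other is allowed everywhere
        # except at the two boundary nodes, which have been split.
        if side[0] not in boundary_nodes:
            neighbours.add(other_side(side))
        for nxt in neighbours - seen:
            seen.add(nxt)
            queue.append(nxt)

    # Separated if y is reached but neither outward-facing side is.
    return y in seen and other_side(x) not in seen and other_side(y) not in seen


# Toy bubble: 1 -> {2, 3} -> 4 -> 5
bubble = [
    ((1, "R"), (2, "L")), ((1, "R"), (3, "L")),
    ((2, "R"), (4, "L")), ((3, "R"), (4, "L")),
    ((4, "R"), (5, "L")),
]
print(is_separable(bubble, (1, "R"), (4, "L")))  # True
print(is_separable(bubble, (1, "R"), (2, "L")))  # False
```

In the toy bubble, the sides `(1, "R")` and `(4, "L")` enclose nodes 2 and 3, while `(1, "R")` and `(2, "L")` fail because the path through node 3 reaches the far side of node 2.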

Chains

A run of consecutive snarls and nodes is called a chain

Snarl decomposition

Snarls and chains can be nested inside each other.

The nested relationship of snarls and chains is described by the snarl tree.
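
One way to picture the snarl tree is as alternating levels of chains and snarls. The sketch below is a minimal illustration with hypothetical class names (`Node`, `Snarl`, `Chain`); it is not the layout vg uses in its distance index. It builds a small tree by hand and measures how deeply snarls are nested.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    node_id: int


@dataclass
class Snarl:
    start: int  # boundary node ID at one end
    end: int    # boundary node ID at the other end
    children: List["Chain"] = field(default_factory=list)


@dataclass
class Chain:
    # Nodes and snarls in order along the chain.
    items: List[Union[Node, Snarl]] = field(default_factory=list)


def snarl_depth(chain: Chain) -> int:
    """Return the deepest level of snarl nesting found inside a chain."""
    deepest = 0
    for item in chain.items:
        if isinstance(item, Snarl):
            inner = max((snarl_depth(c) for c in item.children), default=0)
            deepest = max(deepest, 1 + inner)
    return deepest


# A snarl between nodes 1 and 6 whose child chain holds a second snarl
# between nodes 2 and 5, which in turn has two single-node child chains.
inner = Snarl(2, 5, [Chain([Node(3)]), Chain([Node(4)])])
top = Chain([Node(1), Snarl(1, 6, [Chain([Node(2), inner, Node(5)])]), Node(6)])
print(snarl_depth(top))  # 2
```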

Netgraphs

A netgraph represents a snarl with each of its child chains collapsed into a single node.

Snarl examples (vg deconstruct)

VCF + graph + decomposition for this graph
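
To relate VCF records back to the graph, it helps to read allele traversals as lists of oriented nodes. The sketch below assumes traversals are written in GFA walk style (for example `>1>2>4`, where `>` is forward and `<` is reverse) and that they appear in an INFO tag named `AT`. Both the tag name and the example INFO string are assumptions for illustration and are not taken from the slides.

```python
import re


def parse_walk(walk):
    """Turn a walk string like '>1<3>4' into [(node_id, is_reverse), ...]."""
    if not re.fullmatch(r"(?:[<>]\d+)+", walk):
        raise ValueError(f"not a walk string: {walk!r}")
    return [(int(node), orient == "<")
            for orient, node in re.findall(r"([<>])(\d+)", walk)]


def allele_traversals(info, tag="AT"):
    """Parse every comma-separated walk stored under `tag` in an INFO string."""
    fields = dict(f.split("=", 1) for f in info.split(";") if "=" in f)
    return [parse_walk(w) for w in fields.get(tag, "").split(",") if w]


# Hypothetical INFO column for a simple bubble between nodes 1 and 4.
info = "AC=1;AT=>1>2>4,>1>3>4;LV=0"
for traversal in allele_traversals(info):
    print(traversal)
```

Each printed list is one allele's path through the snarl, so the first and last entries are the snarl's boundary nodes.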

Trick for getting this snarl decomposition to look better (currently only for the distance index):

vg index -j [graph.dist] -w 6

Mapping reads with vg giraffe

Short reads

Long reads

Read mapping

Long read giraffe algorithm: Seeding

  1. Seeding with Minimizer Index
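
As a rough illustration of minimizer seeding, the sketch below picks the smallest k-mer in every window of `w` consecutive k-mers and reports positions where a read and a reference share a minimizer. It makes several simplifying assumptions that differ from vg giraffe: a linear reference string stands in for the graph, k-mers are ordered lexicographically rather than by hash, and only the forward strand is considered. The function names are hypothetical.

```python
def minimizers(seq, k, w):
    """Return sorted (position, kmer) pairs, one minimizer per window."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Smallest k-mer in the window; ties go to the leftmost one.
        best = min(range(w), key=lambda j: (window[j], j))
        picked.add((start + best, window[best]))
    return sorted(picked)


def find_seeds(read, reference, k, w):
    """Return (read_pos, ref_pos) pairs where both share a minimizer."""
    index = {}
    for pos, kmer in minimizers(reference, k, w):
        index.setdefault(kmer, []).append(pos)
    return [(read_pos, ref_pos)
            for read_pos, kmer in minimizers(read, k, w)
            for ref_pos in index.get(kmer, [])]


print(minimizers("ACGTACGT", k=3, w=3))
# [(0, 'ACG'), (1, 'CGT'), (4, 'ACG')]
print(find_seeds("GTACG", "ACGTACGT", k=3, w=2))
# [(0, 2), (2, 0), (2, 4)]
```

The seeds `(0, 2)` and `(2, 4)` agree on the same offset between read and reference, while `(2, 0)` is a spurious hit of the repeated k-mer; telling these apart is the job of the later steps.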

Long read giraffe algorithm: Chaining

  1. Seeding with Minimizer Index
  2. Chaining
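
Chaining can be sketched as a small dynamic program over seeds. The version below is a generic colinear chaining example, not giraffe's scoring: it assumes seeds are `(read_pos, ref_pos)` pairs on a linear coordinate (giraffe works with distances in the graph), and the scoring rule and the `max_gap` parameter are made up for illustration.

```python
def chain_seeds(seeds, seed_len, max_gap=100):
    """Return the best-scoring colinear chain as a list of seeds."""
    seeds = sorted(seeds)
    if not seeds:
        return []
    score = [seed_len] * len(seeds)  # best chain score ending at each seed
    prev = [None] * len(seeds)       # back-pointer to the previous seed
    for i, (read_i, ref_i) in enumerate(seeds):
        for j in range(i):
            read_j, ref_j = seeds[j]
            read_gap, ref_gap = read_i - read_j, ref_i - ref_j
            if read_gap <= 0 or ref_gap <= 0:
                continue  # not colinear
            if max(read_gap, ref_gap) > max_gap:
                continue  # too far apart to join
            # Reward newly covered bases, penalise the implied indel.
            gain = min(seed_len, read_gap, ref_gap) - abs(read_gap - ref_gap)
            if score[j] + gain > score[i]:
                score[i] = score[j] + gain
                prev[i] = j
    best = max(range(len(seeds)), key=score.__getitem__)
    chain = []
    while best is not None:
        chain.append(seeds[best])
        best = prev[best]
    return chain[::-1]


seeds = [(0, 100), (10, 110), (20, 120), (12, 500), (30, 131)]
print(chain_seeds(seeds, seed_len=5))
# [(0, 100), (10, 110), (20, 120), (30, 131)]
```

The off-diagonal seed `(12, 500)` is left out because it is too far from every other seed on the reference, and the one-base shift at `(30, 131)` costs a small penalty but still extends the chain.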

Long read giraffe algorithm: Zip code trees

  1. Seeding with Minimizer Index
  2. Zip code tree making with Distance Index
  3. Chaining with Zip Code Trees

Long read giraffe algorithm: Alignment

  1. Seeding with Minimizer Index
  2. Zip code tree making with Distance Index
  3. Chaining with Zip Code Trees
  4. Alignment with GBWT/graph
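
One ingredient of haplotype-aware alignment is extending a seed without gaps along a haplotype. The sketch below only illustrates that idea and is not giraffe's aligner: a plain string stands in for a haplotype walk that would come from the GBWT, and the function name and the `max_mismatches` parameter are hypothetical.

```python
def extend_seed(read, haplotype, read_pos, hap_pos, seed_len, max_mismatches=2):
    """Extend an exact seed match left and right without gaps.

    Returns (read_start, read_end, mismatches), where the half-open
    interval read[read_start:read_end] is the extended match.
    """
    mismatches = 0

    # Extend to the right of the seed.
    read_end, hap_end = read_pos + seed_len, hap_pos + seed_len
    while read_end < len(read) and hap_end < len(haplotype):
        if read[read_end] != haplotype[hap_end]:
            if mismatches == max_mismatches:
                break
            mismatches += 1
        read_end += 1
        hap_end += 1

    # Extend to the left of the seed with the remaining mismatch budget.
    read_start, hap_start = read_pos, hap_pos
    while read_start > 0 and hap_start > 0:
        if read[read_start - 1] != haplotype[hap_start - 1]:
            if mismatches == max_mismatches:
                break
            mismatches += 1
        read_start -= 1
        hap_start -= 1

    return read_start, read_end, mismatches


haplotype = "GGACGTACGGATCC"
read = "ACGTTCGGAT"  # matches haplotype[2:12] with one mismatch
print(extend_seed(read, haplotype, read_pos=5, hap_pos=7, seed_len=3))
# (0, 10, 1)
```

With `max_mismatches=0` the same call stops at the mismatch and returns `(5, 10, 0)`, leaving the left part of the read for a gapped alignment step.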

Giraffe results

On the HPRC v2 graph which is x size?

vg graph formats and indexes

Indexes

  1. .gbwt (Graph Burrows-Wheeler Transform): haplotype paths
  2. .gg (GBWT Graph): node sequences for a GBWT
  3. .dist (Distance Index): snarl decomposition plus minimum distances
  4. .zipcodes: per-node distance information used by vg giraffe
  5. .min (Minimizer Index): minimizers used by vg giraffe
  6. .gcsa (Generalized Compressed Suffix Array): substring index used by vg map and vg mpmap

Graphs

  1. .gbz (GBWT + GG): the graph induced by the GBWT
  2. .hg (/.vg) (HashGraph): graph format optimized for speed
  3. .pg (/.vg) (PackedGraph): graph format optimized for space efficiency
  4. .xg: older graph format
  5. .vg: protobuf-based graph format

Conclusion

Useful resources